Statistical Optimisation & Data Assimilation

  • Static version of these slides available at: jrper.github.io/rv/da.html

j.percival@imperial.ac.uk

RSM 4.94

Learning objectives for today

  • Summarize core probability theory
  • Describe randomness in scientific computation
  • Identify & implement key data assimilation techniques
  • Describe their key properties & differences

Numerical weather prediction (NWP)

Estimate the future weather using mathematical models on a computer.

  • Big user of HPC/supercomputing equipment
  • Solves massive, memory bounded problems
  • Key thing for the British to talk about

Why study weather forecasting?

  • Knowledge of future weather is valuable.
  • Weather is somewhat predicable.
  • Weather is not totally predictable.

Weather Prediction Models

Weather models solve (somewhat) simplified fluid equations for large scale fluids on a rotating sperical(ish) Earth (eg. the "Primative equations")

$$ \frac{\partial \rho \mathbf{v}}{\partial t} +\nabla_H \cdot \rho \mathbf{vv} +\frac{\partial}{\partial z} (w\mathbf{v}) +2\rho\Omega\times\mathbf{v} =-\nabla p,$$

$$\frac{\partial p}{\partial z} = -\rho g,$$ $$\frac{\partial w}{\partial z} = -\nabla_H \cdot \mathbf{v},$$

Models throw away a lot of small, difficult stuff, e.g.

  • No/damped sound waves!
  • No/filtered "gravity waves".
  • No true boundary layers (surfaces are "rough").
  • Clouds are bulk quantities, not made of droplets.

Definitely not exact!

Still massive systems, still need initial conditions

Tendency to build & run nested models:

  • Global: "low resolution" (e.g. >10 km sometimes solved using spectral methods)
  • Regional (e.g. UK, Europe): High (>1 km) resolution finite volume/finite difference model
  • storm scale/ event scale: Very high (> 100 m) resolution, not operational.

nest.jpg

Gadian et al. 2017

ECMWF

Screenshot_2020-03-13%20ECMWF.jpg

UK Met Office

Screenshot_2020-03-13%20Up%20to%20%C2%A31%202billion%20for%20weather%20and%20climate%20supercomputer.png

In [1]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
make_plots= False

Data Assimilation blends numerical models & observations

From a blog post by a colleague in the mathematics department

Basic Data Assimilation isn't machine learning (probably)

Data assimilation:

  • Converts limited observations (data) into algorithmic (numerical) model inputs.
  • Takes a probabalistic/statistical view of the universe.
  • Mostly believes in its own model physics
  • Is "old" (since 1970s)

Data Assimilation isn't machine learning (probably)

Machine learning:

  • Use lots of data (observations) to create a statistical model algorithmically.
  • Takes a probabilistic/statistical view of the universe.
  • Doesn't know physics a priori.
  • Is "new".

There is a lot of cross-pollination though.

Plan for this session

  • First section: Recap of Probability Theory from CM.
  • Second section Randomness & Computers (or why C rand is kinda bad)
  • Third section: Optimal Statistical Interpolation in space
  • Fourth section: Assimilating across time

Afternoon: Exercises (& first assessment release)

  • Your turn to try things.

Data Assimilation isn't machine learning (probably)

Data assimilation:

  • Converts limited observations (data) into algorithmic (numerical) model inputs.
  • Takes a probabalistic view of the universe.
  • believes in its model physics
  • Is "old".

Data Assimilation isn't machine learning (probably)

(Deep) machine learning

  • Use lots of data (observations) to create a statistical model algorithmicallly.
  • Takes a probabalistic view of the universe.
  • Don't know physics
  • Is "new".

There's a lot of cross-pollination though.

Probability Theory Recap

game-dice-gamble-33968.jpg

Probability

  • "Chance" that discrete random event$X$ is described by label $x$ is $P(X=x)$.
  • For a continuous random event "chance" $X$ is in interval $S$ is $P(X\in S)$
  • If we're lucky, $P(X\in S)$ = $\int_{S}p(x) dx.$
  • We often choose to be lucky.
  • Some famous distributions:
    • Uniform on interval [a, b] $$p(x;a, b) = \begin{cases} \frac{1}{b-a} & a\leq x < b\\0& \mbox{otherwise}\end{cases}$$
    • Gaussian/Normal with mean $\mu$, s.d $\sigma$ $$ p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{(x-\mu)^2}{2\sigma^2})$$
    • logistic with mean $\mu$, s.d $\sigma$ $$p(x; \mu, \sigma) = \frac{1}{{4\sigma}}\mbox{sech}^2 \left(\frac{x-\mu}{2\sigma}\right)$$

Exercises with answers with notes.

In [2]:
x = np.linspace(-3, 3, 100)

def uniform(x, a, b):
    return np.where((x>=a)*(x<b), 1/(b-a), 0)

def normal (x, mu, sigma):
    return np.exp(-0.5*(x-mu)**2/sigma**2)/(np.sqrt(2*np.pi)*sigma)

def logistic(x, mu, sigma):
    return 1/np.cosh((x-mu)/(2*sigma))**2/(4*sigma)
In [3]:
if make_plots or False:
    plt.title('PDFs')
    plt.plot(x, uniform(x, -0.5, 1.5), label='Uniform(-0.5, 1.5)')
    plt.plot(x, normal(x, 0, 1), label='Normal(0,1)')
    plt.plot(x, logistic(x, 0, 1), label='Logistic(0,1)')
    plt.xlabel(r'$x$')
    plt.ylabel(r'$p(x)$')
    plt.legend()
    plt.savefig('images/PDFs.png')

Likelihood

  • Can also view PDFs as functions of a fixed outcomey, $x$ & variable parameter(s):
    • Uniform (lower bound) $$l(a; x, b) = \begin{cases}\frac{1}{b-a} & a < x\\0& \mbox{otherwise}\end{cases}$$
    • Gaussian/Normal (mean), $$ p(\mu; x, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
    • logistic (mean) $$p(\mu; x, \sigma) = \frac{1}{{4\sigma}}\mbox{sech}^2 \left(\frac{x-\mu}{2\sigma}\right)$$
In [4]:
theta = np.linspace(-3, 3)
x0=0

if make_plots or False:
    plt.title('likelihoods (for $x=0$)')
    plt.plot(theta, uniform(x0, theta, 1.5), label='Uniform - a, with b=1.5')
    plt.plot(theta, normal(x0, theta, 1), label='Normal - mean')
    plt.plot(theta, logistic(x0, theta, 1), label='Logistic - mean')
    plt.xlabel(r'$\theta$')
    plt.ylabel(r'$l(\theta)$')
    plt.legend()
    
    plt.savefig('images/likelihoods.png')

  • Sum of Gaussians distributions is Gaussian distributed $$p_{X+Y} = \frac{1}{\sqrt{2\pi(\sigma_X^2+\sigma_Y^2)}}\exp \left( -\frac{(x-\mu_X-\mu_Y)^2}{2(\sigma^2_X+\sigma^2_Y)} \right) $$
  • Not true for other distributions!
  • Not true for multiplication/nonlinear functions

Multivariate Gaussian

  • Consider vector, $\mathbf{z}$ of independent $\mathcal{N}(0,1)$ normal distributions.
  • Joint distribution $$p(\mathbf{z}) = \frac{1}{\sqrt{2\pi}}\exp\left(\frac{\|\mathbf{z}\|^2}{2}\right) $$
$$ P(Z_1\leq z_1, \ldots, Z_n\leq z_n) = \int_{-\infty}^{z_1} \ldots \int_{-\infty}^{z_n} p(\mathbf{z}) dz_n \ldots dz_1 $$

.

  • More generally, thanks to summation/variance rules, have a formula given mean & covariance matrix $$p(\mathbf{z}) = \frac{1}{\sqrt{2\pi\det{\mathbf{S}}}}\exp\left(\frac{(\mathbf{z}-\mathbf{\mu})^T\mathbf{S}^{-1}(\mathbf{z}-\mathbf{\mu})}{2}\right) $$

What does chance mean anyway?

Classical Probability

Classical probability is ratio of successes to cases, where all cases "equally likely":

$$ \mbox{possible results} = \{1, 2, 3, 4, 5, 6\} $$$$ \mbox{results divisible by 3} = \{3, 6\}$$$$ P(\mbox{result divisible by 3}) = \frac{2}{6} \approx 0.3333 $$
In [5]:
# Pythons standard random number library
import random
def d6():
    """A Python implementation of a six sized dice"""
    # We could also use numpy's np.random.randint(1, 7)
    # which will vectorize better.
    # note the different ways to set the endpoint
    return random.randint(1,6)

[d6() for _ in range(10)]
Out[5]:
[3, 6, 5, 5, 4, 2, 3, 1, 6, 5]
In [6]:
# Run a simple statistical tests on our d6
n1 = 1000
def sample():
    success = 0
    for i in range(n1):
        # successful trial divides by 3 
        success += (d6()%3 == 0)
    # return the ratio of successes to failures
    return success/n1
In [7]:
plt.figure(figsize=(20,8))
n2 = 1000
plt.hist([sample() for _ in range(n2)], density=True);
x = np.linspace(0.28,0.38)
plt.plot(x, normal(x, 1/3, np.sqrt((1/6)*(5/6)/n1)))
Out[7]:
[<matplotlib.lines.Line2D at 0x2cc2a693fa0>]

Frequestist probability

  • Frequentist probability is the limit of many trials.
  • explains what you'd expect to see "on average".
In [8]:
def f_n(N):
    return [sum([d6()%3==0 for _ in range(n)])/n for n in N]

x_vals = [2 **i for i in range(1,20)]
y_vals = f_n(x_vals)

plt.plot(x_vals, y_vals)
plt.hlines(1/3, x_vals[0], x_vals[-1], 'k', ls=':')
plt.xscale('log')
y_vals[-1]
Out[8]:
0.3328723907470703

Bayesian Probability

  • Bayesian probability is a statement of belief
  • Can be defined for one event
  • Can be different for different observers
  • Liable to update with new information

Bayes Theorm:

$$ P(A| B) = \frac {P(B | A) P( A)}{P(B)} $$

Can use this as a formula to "gain knowledge". $$\tag{Frequentist/Bayesian} P(\mbox{Is American} | \mbox{prefers coffee to tea}) = \frac{P(\mbox{prefers coffee to tea} | \mbox{Is American})P(\mbox{Is American})}{P(\mbox{Prefers coffee to tea})} $$ $$ \frac{200}{227} = \frac{\frac{2}{3}\frac{5}{6}}{\frac{2}{3}\frac{5}{6}+\frac{3}{8}\frac{1}{6}}$$ $$\tag{Bayesian} \mbox{Posterior estimate} \propto \mbox{Likelihood}\times \mbox{Prior estimate} $$ $$\tag{Bayesian} p(\theta|\mbox{new data}) \propto p(\mbox{new data}|\theta)p(\theta| \mbox{previous knowledge}) $$

Here normalisation factor (called marginal likelihood/evidence) is $$E(f(X)) := \int_{\mbox{possible values of }\theta} p(\mbox{new data}|\theta) p(\theta) d\theta$$

same format as expected value (or expectation) operator, $E$

$$E(f(X)) := \int_{\mbox{possible values of }x} f(x) p(x) dx$$

Mean, variance, median and mode

$$\tag{mean}\mu_X:=E(X)$$$$E(\mbox{6 sided dice}) := \sum_{x=1}^6 x \frac{1}{6}=3.5$$$$\tag{variance}\sigma^2_X := E((X-\mu)^2)$$$$\quad\qquad=E(X^2)-E(X)^2$$$$\tag{median (1d only)} x \mbox { s.t. } P(X\leq x)=0.5$$$$\tag{mode} \mbox{argmax} p(x)$$

Some properties of the expectation operator

Fixed values stay fixed $$E(10) = 10$$ Linear in scalar multiplication $$E(3X) = 3E(X)$$ distributes over addition $$ E(X+Y) = E(X)+E(Y) $$

Full proofs surprisingly involved (and we wouldn't expect you to write them down from scratch). E.g.

$$ E(X+Y) = \sum_{x,y} (x+y)P(X=x\cap Y=y) $$$$\qquad \qquad \qquad \qquad \qquad= \sum_{x,y} x P(X=x\cap Y=Y)+\sum_{x,y} y P(Y=x\cap X=x) $$$$\sum_{x,y} x P(X=x\cap Y=Y)=\sum_{x,y} xP(x)(Y=y\vert X=x) $$$$=\sum_x \left[xP(x)\underbrace{\sum_y(P(y\vert x)}_{=1}\right] =\sum_x xP(x)=E(X)$$
In [9]:
import numpy as np
n_samples = 100
mu = []
for _ in range(1000):
    data = np.array([d6() for i in range(n_samples)])
    mu.append(np.sum(data)/n_samples) # or just np.mean()
plt.figure(figsize=(20, 8))
plt.vlines(3.5, 0, 3, 'k')
plt.hist(mu, density=True);

Estimators

Rule to give guess at a parameter of random data based on observed data. Eg.

$$\frac{1}{n}\sum_{i=1}^n x_i \approx \mu_X$$

.

Estimator $\hat{\xi}$ of property $\xi$ is unbiased if

$$E(\hat\xi)=E(\xi)$$

i.e. it's right "on average".

$$E\left(\frac{1}{n}\sum_{i=1}^n x_i\right) = \frac{1}{n}\sum_{i=1}^n E(x_i) = \mu_X$$

.

Similar argument gives, $$\frac{\sum_{i=1}^n \left(x_i-\frac{\sum_{j=1}^n x_j}{n}\right)^2}{n}\approx \sigma^2_X$$

but $$E\left(\frac{\sum_{i=1}^n \left(x_i-\frac{\sum_{j=1}^n x_j}{n}\right)^2}{n}-\sigma^2_X\right)=\frac{\sigma^2_X}{n}$$ Estimator is biased.

a "better" (i.e unbiased) estimate for variance is $$\frac{\sum_{i=1}^n \left(x_i-\frac{\sum_{j=1}^n x_j}{n}\right)^2}{n-1}$$ although for some distributions this will have a higher average error than the biased version (just not so "one-sided").

  • Maximum likelihood estimator is estimator that maximizes likehood for given $x$.
In [10]:
def compare_bias():
    var_biased = []
    var_unbiased = []
    n_samples = 100
    for _ in range(1000):
        data = np.array([d6() for i in range(n_samples)])
        var_biased.append((np.sum((data-mu[-1])**2)/n_samples)) # or np.var(a)
        var_unbiased.append((np.sum((data-mu[-1])**2)/(n_samples-1))) # or np.var(a, ddof=1)
    return var_biased, var_unbiased
In [11]:
biased, unbiased = compare_bias()

plt.hist(biased, density=True, alpha=0.5, label='biased')
plt.hist(unbiased, density=True, alpha=0.5, label='unbiased');
plt.legend()
plt.vlines(35/12, 0, 2);

Independence, correlation and covariance

Given two random variables $X$ & $Y$, we can define a joint (vector) probability

$$ P(X=x\mbox{ and }Y=y) $$

The vectors are independent if for all $x, y$ $$ P(X=x\mbox{ and }Y=y) = P(X=x)P(Y=y)$$

Let $E(X)=\mu_X$, $E(Y)=\mu_Y$, can define covariance $$\mbox{cov}(X,Y) = E((X-\mu_X)(Y-\mu_y) = E(XY)-\mu_X\mu_Y$$ $$0\leq\mbox{cov}(X,Y)^2\leq\sigma_X^2\sigma_Y^2$$

Can also define correlation $$ \mbox{cor}(X,Y) = \frac{\mbox{cov}(X,Y)}{\sigma_X\sigma_Y} $$ Hence $$ -1 \leq \mbox{cor}(X,Y) \leq 1 $$

Part II: Random integers on a computer

Two (maybe 3?) kinds of "randoms" generated by computers

  1. Pseudo-random/deterministic (PRNGs)
  2. Hardware/True random (TRNGs)
  3. Cryprographically secure pseudo-random (CSPRNG)

PRNGS

  • Each "random" number in sequence derived by a rule from previous one(s).
  • Hard to guess individually, but not "surprising"
  • "Cheap" to generate
  • Reproducable (given seed)
  • given enough values, might guess the next
In [12]:
# A (bad) example
class Randu:
    """Never EVER use this for anything non-trivial"""
    
    def __init__(self, seed=1):
        
        self._I = seed
        
    def __call__(self):
        self._I = (65539*self._I)%(2**31)
        return self._I
 
randu = Randu(1)

print([randu() for _ in range(5)])
randu = Randu(2)
print([randu() for _ in range(5)])
[65539, 393225, 1769499, 7077969, 26542323]
[131078, 786450, 3538998, 14155938, 53084646]
In [13]:
randu_data = [randu() for _ in range(2**11)]
# randint uses https://en.wikipedia.org/wiki/Mersenne_Twister
random_data = [random.randint(0,2**31-1) for _ in range(2**11)]
plt.hist(randu_data, 32, alpha=0.5, label='Randu')
plt.hist(random_data, 32, alpha=0.5, label='randint')
plt.legend();

Similar to code used in Windows C/C++ rand implementation

In [14]:
class msvc_rand():
    def __init__(self, seed=1):
        
        self._I = seed
        
    def __call__(self):
        self._I = self._I*214013 + 2531011
        return (self._I) >> 16 & 0x7fff 
    
rand = msvc_rand()

print(rand())
print(rand())
 
41
18467
  • Both examples of "linear congruential generators" (LCGs).
  • These are fast methods
  • Come with problems (in statistical tests)
  • randu is BAD. Windows rand is "not good". Linux rand may or may not meet current acceptable standards (default does)

Good rules of thumb

  • Explicit is better than implicit.
  • Know (and state) what PRNG you are using (e.g. library in C++).
  • Keep a copy of the seed.
  • All of this gets weirder in parallel

TRNGs

  • Use external physical/hardware source of randomness (often as seed of PRNG)
  • Can't be guessed based on previous data
  • High cost to implement
  • Not reproducable
  • Useful for cryptography/security

CSPRNGs

  • Use TRNG to (re)seed goood PRNG on initialization
  • Unlikely to be guessed based on previous data
  • Middling cost to implement
  • Not reproducable
  • Still good for cryptography/security
In [15]:
### An example interaction with a CSPRNG
import os
b = os.urandom(4)
print(b)
# Here "big" means https://en.wikipedia.org/wiki/Endianness#Big-endian
int.from_bytes(b, 'big')
b'!\xe2\xa6\xd7'
Out[15]:
568501975

Random reals on a computer

  • The heart of an RNG is almost always a generator for uniform integers
  • Usually also between 0 and some $N_\max$ (often in form $2^N$).
  • To get other distributions we must do extra mathematics
In [16]:
class RealRandu(Randu):
    """Real uniform random number generator based on RANDU."""

    def __call__(self):
        super().__call__();
        return self._I/(2**31)
    
randr = RealRandu()
data = [randr() for _ in range(9999)]
plt.hist(data, density=True);
In [17]:
### Now for the true evil of Randu!
from mpl_toolkits.mplot3d import Axes3D
rand = Randu(1)
data = [rand() for _ in range(18000)]
data = np.array(data).reshape((len(data)//3, 3)).T
fig = plt.figure(figsize=(9, 9))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(*data)

ax.azim = -134; ax.elev =38
In [18]:
# What about Gaussians?

N = 100000
a = np.random.uniform(size=N)
b = np.random.uniform(size=N)

plt.hist(a, density=True, alpha=0.4)
plt.hist(b, density=True, alpha=0.4)
plt.hlines(1.0, 0.0, 1.0);
In [19]:
#A python demonstration of the Box-Muller algorithm

def box_muller(a, b):
    r = np.sqrt(-2*np.log(a))
    theta = 2*np.pi*b
    return r*np.sin(theta), r*np.cos(theta)
n1, n2 = box_muller(a, b)
plt.hist(n1, density=True, alpha=0.4)
plt.hist(n2, density=True, alpha=0.4)
x = np.linspace(-4, 4)
def p_x(x):
    return 1/np.sqrt(2*np.pi)*np.exp(-x**2/2)
plt.plot(x, p_x(x));

Part III : Data Assimilation: a good reference

kalnay.jpg

Eugenia Kalnay - Atmospheric modeling, data assimilation and predicability

The Language of data assimilation

A paper (Ide et al. 1997) sets out a standard notation:

$$ \mathbf{x} : \mbox {vector of state variables (Stephan's $u$)} $$$$ \mathbf{y} : \mbox{vector of observations (data)}$$

States

$$ \mathbf{x}_b : \mbox {Background state (first guess)}$$$$ \mathbf{x}_a : \mbox {Analysis state (final guess)}$$$$ \mathbf{x}_t : \mbox {True state}$$

Observations & models

$$ \mathbf{H}, \mathbf{h}: \mbox{observation operator (takes $\mathbf{x}$ to $\mathbf{y}$)} $$$$ \mathbf{M}, \mathbf{m}: \mbox{forward model (not quite Stephan's $g$)}$$

Errors

$$ \mathbf{e}_b:= \mathbf{x}_b-\mathbf{x}_t : \mbox {Background error}$$$$ \mathbf{e}_a:= \mathbf{x}_a-\mathbf{x}_t : \mbox {Analysis error}$$$$ \mathbf{e}_o:= \mathbf{y}-\mathbf{h}(\mathbf{x}_t) : \mbox {Observation error}$$

Error covariance matrixes

$$ \mathbf{B}:=E(\mathbf{e}_b\mathbf{e}_b^T): \mbox{Background}$$$$ \mathbf{P}:=E(\mathbf{e}_a\mathbf{e}_a^T): \mbox{Analysis}$$$$ \mathbf{R}:=E(\mathbf{e}_o\mathbf{e}_o^T):\mbox{Observations}$$

Nudging: the first DA method

Numerics normally updates $\mathbf{x}$ through differential/difference equation $$\frac{\partial \mathbf{x}}{\partial t} = \mathbf{m}(\mathbf{x})$$

q.v first part of Stephan's $g$.

State estimation versus parameter estimation

  • Data assimilation draws distinction between model state, which changes with time and model parameters, which are assumed fixed for each run.
  • A state estimate is needed :
    1. to know where we are/were
    2. as ICs to run model forward
  • Stephan basically covered parameter estimation yesterday (without the probability).

Nudging

Given direct observation, $\mathbf{y}$ update via fictional force

$$\frac{\partial \mathbf{x}}{\partial t} = \mathbf{m}(\mathbf{x})\color{red}{+\frac{\alpha}{\Delta t}(\mathbf{y}-\mathbf{x})}$$

not a brilliant method:

  • only works with direct observations
  • fake heating drives spurious circulations
  • point updates only.

    Lets get statistical

Statistical interpolation

The time from two clocks

  • Suppose you have two clocks
    • First says it's 10:21 and 95% of the time its $\pm 2$ minutes
    • Second says it's 10:24 and 95% of the time its $\pm 1$ minutes
  • What's the time?

The BLUE

For quantative version, can seek Best Linear Unbiased Estimator

The BLUE

For quantative version, can seek Best Linear Unbiased Estimator

  • Estimator : fixed guess at statistical quantity

The BLUE

For quantative version, can seek Best Linear Unbiased Estimator

  • Estimator : fixed guess at statistical quantity
  • Unbiased : Error has zero mean

The BLUE

For quantative version, can seek Best Linear Unbiased Estimator

  • Estimator : fixed guess at statistical quantity
  • Unbiased : Error has zero mean
  • Linear: $T_\mbox{guess} = a T_1 + b T_2$

The BLUE

For quantative version, can seek Best Linear Unbiased Estimator

  • Estimator : fixed guess at statistical quantity
  • Unbiased : Error has zero mean
  • Linear: $T_\mbox{guess} = a T_1 + b T_2$
  • Best: minimise mean square error
$$ T_1 = T_t+e_1 $$$$ T_2 = T_t+e_2 $$$$ T_\mbox{guess} = (a+b) T_t + a e_1 +b e_2 $$$$ e_\mbox{guess} := T_\mbox{guess}-T_t = (a+b-1)T_t + ae_1+be_2 $$
$$ E(e_\mbox{guess}) := (a+b-1)T_t + aE(e_1)+bE(e_2) $$
  • If $e_1$ & $e_2$ unbiased want $$ a+b = 1$$ for estimate to be unbiased - constraint.
  • If observations are also independent $$E(e^2_\mbox{guess}) = a^2E(e_1^2) +b^2 E(e_2^2)$$
  • Substitute for $b$ and variances: $$E(e^2_\mbox{guess} = a^2\sigma_1^2 +(1-a)^2 \sigma_2^2$$
  • Minimise w.r.t $a$ $$\frac{df}{da}= 2a\sigma_1^2 -2(1-a)\sigma_2^2 $$
  • i.e. $$ a = \frac{\sigma_2^2}{\sigma_1^2 +\sigma_2^2} $$ $$ T_\mbox{BLUE} = \frac{\sigma^2_2 T_1+\sigma_1^2 T_2}{\sigma^2_1+\sigma^2_2} $$
In [20]:
# We'll make the truth the zero state 
T_true = 0.0
T_1 = T_true + np.random.normal(0, scale=2, size=1000)
T_2 = T_true + np.random.uniform(-np.sqrt(3),np.sqrt(3), size=1000)
def T_guess(a):
    """Function to generate zero bias """
    return a*T_1+(1-a)*T_2
x=np.linspace(0,1)
e_guess = [np.mean((T_guess(_)-T_true)**2) for _ in x]

plt.plot(x, e_guess, label='mean error from 1000 trials')
plt.plot(1**2/(1**2+2**2),(1**2*2**2)/(1**2+2**2),'ko', label='Predicted minimum mean square error');
plt.xlabel('Weighting'); plt.ylabel('Mean square error'); plt.legend();

Optimal Interpolation

Now do a vector version

New information (aka innovation vector) is $\mathbf{y}-\mathbf{H}\mathbf{x}_b$.

Want new state $\mathbf{x}_a=\mathbf{x}_b+\mathbf{W}(\mathbf{y}-\mathbf{H}\mathbf{x}_b)$

Choose to minimise total analysis mean square error $$\tag{mean square error} \mathcal{J} = E(\mathbf{e}_a\cdot\mathbf{e}_a)$$

$$\mathbf{y} = \mathbf{H}\mathbf{x}_t+\mathbf{e}_o$$$$ \mathbf{x}_a-\mathbf{x}_t = \mathbf{x}_b-\mathbf{x}_t +\mathbf{W}(\mathbf{H}\mathbf{x}_t+\mathbf{e}_o-\mathbf{H}\mathbf{x}_b)$$

So $$ \mathbf{e}_a = \mathbf{e}_b +\mathbf{W}(\mathbf{e}_o-\mathbf{H}\mathbf{e}_b)$$ Normal equation $\frac{\mathcal{J}}{\mathbf{W}}$ $$E\left(\left[\mathbf{e}_b +\mathbf{W}(\mathbf{e}_o-\mathbf{H}\mathbf{e}_b)\right](\mathbf{e}_o-\mathbf{H}\mathbf{e}_b)^T\right)=\mathbf{ 0}.$$ Assume errors are unbiased and no correlations between $\mathbf{e}_b$ and $\mathbf{e}_o$. $$-E(\mathbf{e}_b\mathbf{e}_b^T)\mathbf{H}^T+\mathbf{W}\left(E(\mathbf{e}_o\mathbf{e}_o^T)+\mathbf{H}E(\mathbf{e}_b\mathbf{e}_b^T)\mathbf{H}^T\right)=\mathbf{0}$$

$$-\mathbf{B}\mathbf{H}^T+\mathbf{W}\left(\mathbf{R}+\mathbf{H}\mathbf{B}\mathbf{H}^T\right)=\mathbf{0}$$
  • Optimal (Kalman) gain $$\mathbf{W}=\mathbf{B}\mathbf{H}^T\left(\mathbf{R}+\mathbf{H}\mathbf{B}\mathbf{H}^T\right)^{-1}$$
  • Hence optimal analysis is $$\mathbf{x}_a=\mathbf{x}_b+\mathbf{B}\mathbf{H}^T\left(\mathbf{R}+\mathbf{H}\mathbf{B}\mathbf{H}^T\right)^{-1}(\mathbf{y}-\mathbf{H}\mathbf{x}_b)$$
In [21]:
### Example optimal interpolation implementation & solution
from scipy.linalg import inv
### define the standard deviation of the background and observations

sigma_t = 1.0; sigma_b = 1.0; sigma_r = 1.0
l_t = 0.2; l_e = 0.1; l_b = l_e
s = np.linspace(0, np.pi)

e_b = np.zeros(len(s));  x_t = np.zeros(len(s))
for _ in range(len(s)):
    e_b += np.random.normal(0, sigma_b)*np.exp(-(s-s[_])**2/l_e**2)
    x_t += np.random.normal(0, sigma_t)*np.exp(-(s-s[_])**2/l_t**2)
x_b = x_t + e_b
In [22]:
H = np.zeros((len(s)//5, len(s)))
for _ in range(H.shape[0]):
    H[_,5*_] = 1
    
y = np.dot(H, x_t) 
y += np.random.normal(0, 1, size=(y.shape))
R = sigma_r**2*np.eye(y.shape[0])

s2 = np.broadcast_to(s, (len(s), len(s)))
## Here we calculate B from known statistics 
B = sigma_b**2*np.exp(-(s2-s2.T)**2/l_b**2)

W = B.dot((H.T)).dot(inv(R+H.dot(B.dot(H.T))))
x_a = x_b + W.dot(y-H.dot(x_b))
In [23]:
plt.figure(figsize=(15, 6))
plt.plot(s, x_t, 'k', label='$x_t$, true state')
plt.plot(s, x_b, 'b', label='$x_b$, background guess')
plt.scatter(s[::5], y, label='obervation')
plt.plot(s, x_a, 'r', label='$x_a$, final analysis')
plt.legend();

A note on $\mathbf{H}$

  • Observation operator projects state onto the observations
  • Says what we expect to see.
  • Often (linear) interpolation.
  • Eg. if $y_1$ is direct observation halfway between 1st and 2nd gridpoint, $$ H_{1i} = (0.5,0.5,0,\ldots,0)$$

Good things with Optimal Interpolation:

  • Can use single $\mathbf{B}$
  • If $\mathbf{x}$ is longer than $\mathbf{y}$, matrix to invert is probably (relatively) small. #### Bad things:
  • No time dependence
  • Linear $\mathbf{H}$s only. Can do better with more assumptions

Spatial Variational Assimilation: 3D-Var

  • Assume all errors are multivariate Gaussian
  • Assume observations and state errors are independent & uncorrelated
  • Look for highest posterior likelihood state (i.e centre of the bell curve)
  • Bayes' Theorem plus independece:
$$ p(\mathbf{x}\vert\mathbf{y},\mathbf{x}_b) = \frac{p(\mathbf{y}\vert\mathbf{x}, \mathbf{x}_b)p(\mathbf{x}|\mathbf{x}_b)}{P(\mathbf{y})} $$
$$p(\mathbf{x}\vert\mathbf{y}, \mathbf{x}_b) \propto \exp\left(-\frac{(\mathbf{y}-\mathbf{h}(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-\mathbf{h}(\mathbf{x})+(\mathbf{x}_b-\mathbf{x})^T\mathbf{B}^{-1}(\mathbf{x}_b-\mathbf{x}))}{2}\right) $$

Sufficient to minimize $$J(\mathbf{x}) = (\mathbf{x}_b-\mathbf{x})^T\mathbf{B}^{-1}(\mathbf{x}_b+\mathbf{x})+(\mathbf{y}-\mathbf{h}(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-\mathbf{h}(\mathbf{x}))$$

  • "log-likelihood"
  • Minimum at $\mathbf{x}_a$ (analysis).
In [24]:
# Example 3D-Var implementation and solution
# We will use some weather-like 2d data and generate the B from climatology.

nx = 26; ny = 11
Lx = 1e6; Ly = 4e5
U_0 =  30.0; radius = 5e4

def wind_field(X, Y, circulations, centres):
    U = np.full((ny, nx), U_0)
    V = np.zeros((ny, nx))
    for circ, (x, y) in zip(circulations, centres):
        r2= (X-x)**2 + (Y-y)**2
        u = circ/(2*np.pi)*np.where(r2>radius**2, 1./r2, 1.0/radius**2) 
        U -= (Y-y)*u
        V += (X-x)*u
    return U, V
X, Y = np.meshgrid(np.linspace(0,Lx,nx), np.linspace(0,Ly, ny))

def random_vortices(N, kx=5, ky=5):
    return (200*np.random.lognormal(0, 0.1, size=N)*radius,
            np.random.uniform([-kx*radius, -kx*radius], [Lx+ky*radius, Ly+ky*radius], (N, 2)))
In [25]:
def plot_wind(X, Y, u, v):
    plt.figure(figsize=(15,5))
    plt.quiver(X, Y, u, v, np.sqrt(u**2+v**2))
    plt.colorbar()
    plt.axis('equal')
U_t, V_t = wind_field(X, Y, *random_vortices(4, -1, -1))
plot_wind(X, Y, U_t, V_t);
In [26]:
# observation locations
n_full = 25
n_speed = 25
y_loc = np.random.randint(0, nx*ny, n_full+n_speed)
# observation values
y = np.empty(2*n_full+n_speed)
y[:n_full] = U_t.ravel()[y_loc[:n_full]] + np.random.normal(0, 2.0, n_full)
y[n_full:2*n_full] = V_t.ravel()[y_loc[:n_full]] + np.random.normal(0, 2.0, n_full)
y[2*n_full:] = (np.sqrt(U_t.ravel()[y_loc[n_full:]]**2
                      + V_t.ravel()[y_loc[n_full:]]**2)
                      + np.random.normal(0, 2, n_speed))
In [27]:
def h(x):
    hx = np.empty(2*n_full+n_speed)
    u = x[y_loc]
    v = x[ny*nx+y_loc]
    hx[:n_full] = u[:n_full] = u[:n_full]
    hx[n_full:2*n_full] = v[:n_full]
    hx[2*n_full:] = np.sqrt(u[n_full:]**2+v[n_full:]**2)
    
    return hx

R = 2.0**2*np.eye(2*n_full+n_speed)

plt.figure(figsize=(15,5))
plt.scatter(X.ravel()[y_loc[:n_full]], Y.ravel()[y_loc[:n_full]], c='r')
plt.scatter(X.ravel()[y_loc[n_full:]], Y.ravel()[y_loc[n_full:]], c='k')
plt.axis([0, Lx, 0, Ly]);
In [28]:
U = np.empty((5000,ny,nx)); V = np.empty((5000,ny,nx))
for _ in range(U.shape[0]):
    U[_, : :], V[_, :, :] = wind_field(X, Y, *random_vortices(4))
mu_u = np.mean(U, 0); mu_v = np.mean(V, 0)
plot_wind(X, Y, mu_u, mu_v)
In [29]:
d = np.empty((U.shape[0], 2*ny*nx))
for _ in range(d.shape[0]):
    d[_, :ny*nx] = (U[_, :]-mu_u).ravel() 
    d[_, ny*nx:] = (V[_, :]-mu_v).ravel()
B = np.empty((2*nx*ny, 2*nx*ny))
for i in range(2*nx*ny):
    for j in range(2*nx*ny):
        # We'll be good and use the unbiased (N-1) form of the covariance
        B[i, j] = np.sum(d[:, i]*d[:, j])/(U.shape[0]-1)
plt.figure(figsize=(15,5))
plt.pcolormesh(X, Y, B[ny//2*nx+nx//2, :ny*nx].reshape((ny,nx)), shading='auto')
plt.colorbar();
In [30]:
x_b = np.empty(2*ny*nx) 
x_b[:ny*nx] = mu_u.ravel(); x_b[ny*nx:] = mu_v.ravel()

Binv = inv(B); Rinv = inv(R)

def J(x):
    dx_b = x-x_b
    dx_o = y - h(x)
    return np.dot(dx_b, Binv.dot(dx_b))+np.dot(dx_o,Rinv.dot(dx_o))

from scipy.optimize import minimize
res = minimize(J, x_b, method='CG', tol = 1e-3, options={'maxiter':20})
x_a = res.x

U_a = x_a[:ny*nx].reshape((ny,nx))
V_a = x_a[ny*nx:].reshape((ny,nx))
In [31]:
# first-guess - climatography
plot_wind(X, Y, mu_u, mu_v)
In [32]:
# real state
plot_wind(X, Y, U_t, V_t)
In [33]:
# our analysis
plot_wind(X, Y, U_a, V_a)

Good things:

  • Can use single $\mathbf{B}$
  • Can apply standard minimization techniques
  • Allows nonlinear observations #### Bad things:
  • No time dependence
  • Iteration to nonlinear solution more costly

Two names, same outcome

If $\mathbf{h}(\mathbf{x})$ linear (i.e. $\mathbf{h}(\mathbf{x})=\mathbf{H}\mathbf{x}$)

then for 3D-Var

$$\mathbf{x}_a = \mathbf{x}_b + \left(\mathbf{B}^{-1}+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{R}^{-1}[\mathbf{y}-\mathbf{H}\mathbf{x}_b] $$

Actually same as Optimal Interpolation!

In this case

$$J(\mathbf{e})=\|\mathbf{e}-\mathbf{e}_b\|^2_{\mathbf{B}}+\|\mathbf{H}\mathbf{e}-\mathbf{e}_o\|^2_{\mathbf{R}} $$

for norms

$$\|\mathbf{a}\|^2_{\mathbf{M}}=\mathbf{a}^T\mathbf{M}^{-1}\mathbf{a}$$

Can view as special case of regularization!

Part IV - Time in the assimilation process

OI.png

Not making full use of data!

The Kalman Filter

  • Consider a system with a known linear forward model $\mathbf{M}$ $$\mathbf{x}^{(k+1)}_t = \mathbf{M}\mathbf{x}^{(k)}_t + \mathbf{f},$$
  • At step $k$ forecast is $$\mathbf{x}^{(k+1)}_b = \mathbf{M}\mathbf{x}^{(k)}_a +\mathbf{f}_b,$$
  • error update equation is $$\mathbf{e}_b^{k+1} = \mathbf{M}\mathbf{e}^k_a +\mathbf{e}_f$$

where $\mathbf{e}_f := \mathbf{f}_b-\mathbf{f}$.

  • Error covariances go like $$\mathbf{B}^{k+1} = E(\mathbf{e}_b^{k+1}[\mathbf{e}_b^{k+1}]^T) $$ $$\qquad \qquad \qquad= \mathbf{M}E(\mathbf{e}_b^k[\mathbf{e}_b^k]^T)\mathbf{M}^T+E(\mathbf{e}_f[\mathbf{e}_f]^T)$$ $$\qquad = \mathbf{M}\mathbf{P}^k\mathbf{M}^T+\mathbf{Q}$$

KalmanFilter.png

Four stage process

  1. Use model to update analysis into forecast $$ \mathbf{x}^{k+1}_b = \mathbf{M}\mathbf{x}^k_a+\mathbf{f}_b$$
  2. Use model to update analysis error covariance into forecast error covariance $$\mathbf{B}^{k+1} = \mathbf{M}\mathbf{P}^k\mathbf{M}^T +\mathbf{Q}$$
  3. Use observation innovations to (optimally) update forecast into analysis, $$ \mathbf{x}^{k+1}_a = \mathbf{x}^{k+1}_b + \mathbf{B}^{k+1} \mathbf{H}^T\left( \mathbf{R}^{k+1} + \mathbf{H}\mathbf{B}^{k+1}\mathbf{H}^T\right)^{-1} (\mathbf{y}^{k+1}-\mathbf{H}\mathbf{x}^{k+1}_b).$$
  4. Calculate the analysis error covariance. $$\mathbf{P}^{k+1} = (\mathbf{I}-\mathbf{B^{k+1}}[\mathbf{H}^{k+1}]^T(\mathbf{R}^{k+1}+\mathbf{H}^{k+1}\mathbf{B}^{k+1}[\mathbf{H}^{k+1}]^T)^{-1}\mathbf{H}^{k+1})\mathbf{B}^{k+1}.$$

Good things:

  • Always have most appropriate $B$
  • Know error bounds
  • Most accurate approach #### Bad things:
  • Calculating new $B$ usually really costly.
  • Information only carried forwards, not back.
  • Linear $\mathbf{H}$ & $\mathbf{M}$ (can do a bit better here).
  • Kalman Filter only goes forwards
  • If we assume an error distribution, we can also carry information backwards.
  • Usually called Kalman smoother

KalmanSmoother.png

Gets expensive. We can try to do things cheaper.

Temporal Variational Assimilation: 4D-Var

  • 4D-Var is time-aware equivalent of 3D-Var method
  • Assuming observation and background errors fGaussian.
  • Look for initial state vector, $\mathbf{x}_0$, maximizing the joint probability density over "time window"
$$ p(\mathbf{x}_0\vert \mathbf{y}) \propto \exp\left(-\frac{(\mathbf{x}_b-\mathbf{x_0})^T\mathbf{B}^{-1}(\mathbf{x}_b-\mathbf{x}_0)+\sum_i(\mathbf{y}_i-\mathbf{h}_i(\mathbf{x}_i)))^T\mathbf{R}_i^{-1}(\mathbf{y}_i-\mathbf{h}_i(\mathbf{x}_i))}{2}\right),$$

subject to constraint that $\mathbf{x}_{i+1} = \mathbf{m}(\mathbf{x}_i, t_i)$.

4DVar.png

windowed_4DVar.png

  • PDE constrained optimization problem.
  • Solution through adjoint methods, as in parameter estimation.
  • Let's look briefly at the discrete case:

4D-Var as constrained optimization

Cost function to minimise ($-\log (p)$) is

$$\mathcal{J}_{ext}= \frac{(\mathbf{x}_b-\mathbf{x_0})^T\mathbf{B}^{-1}(\mathbf{x}_b-\mathbf{x}_0)+\sum_i(\mathbf{y}_i-\mathbf{h}_i(\mathbf{x}_i)))^T\mathbf{R}_i^{-1}(\mathbf{y}_i-\mathbf{h}_i(\mathbf{x}_i))}{2} + \sum_{i=0}^{N-1} \boldsymbol{\lambda}_i^T\left(\mathbf{x}_{i+1}-\mathbf{x}_i-\mathbf{m}(\mathbf{x}_i, t_i)\right)$$

where we are perturbing the $\mathbf{x}_0$.

Optimality conditions are:

$$\boldsymbol{\lambda_i}: \quad \mathbf{x}_{i+1}- \mathbf{x}_i -\mathbf{m}(\mathbf{x}_i, t_i) =\mathbf{0}$$$$\mathbf{x}_0: \quad \mathbf{B}^{-1}(\mathbf{x}_b-\mathbf{x}_0)+\mathbf{H}_0\mathbf{R}^{-1}(\mathbf{h}_0(\mathbf{x}_0)-\mathbf{y})-\boldsymbol{\lambda}_0=\mathbf{0}$$$$ \mathbf{x}_i: \quad\boldsymbol{\lambda}_{i-1}-\boldsymbol{\lambda}_{i}-\mathbf{M}_{i}^T\boldsymbol{\lambda}_i+\mathbf{H}_{i}^T\mathbf{R}^{-1}_{i}(\mathbf{h}_{i}(\mathbf{x}_{i})-\mathbf{y}_{i}))=\mathbf{0} $$$$ \mathbf{x}_N \quad-\boldsymbol{\lambda}_{N-1}+\mathbf{H}_{N}^T\mathbf{R}^{-1}_{N}(\mathbf{h}_{N}(\mathbf{x}_{N})-\mathbf{y}_{N}))=\mathbf{0} $$

where $\left[\mathbf{M}_k\right]_{ij} = \frac{\partial [\mathbf{m}_k(\mathbf{x}_k, t_k)]_i}{\partial [\mathbf{x}_k]_j}$ the linearization of the nonlinear forward model.

  • If model were linear, could start at $\boldsymbol{\lambda}_N$, solve forced adjoint backwards to $\boldsymbol{\lambda}_0$ and then find optimal $\mathbf{x}_0$.
  • Since it usually isn't, can iterate in a gradient based nonlinear sovler for $\mathbf{x}_0$ instead, as in lecture 7.
  • This is "just" PDE constrained optimization for our initial condition, with a cost/penalty function as the MLE (for Gaussian errors).

Good things:

  • Implicitly improves/updates $B$ over course of time window (see exercises).
  • uses all information in time window
  • Cheaper than equivalent Kalman filter #### Bad things:
  • New $B$ not saved.
  • Backing out individual error bounds significant extra work
  • Costlier that OI/3D-Var
  • Adjointing is hard, assumes model is differentiable.

Weak contraint 4D-Var

  • The version shown assumes that the model is perfect.
  • Can also work through the case where the model has errors with known covariance (as in the Kalman Filter.

New cost function is

$$\mathcal{J}_{ext}= \frac{(\mathbf{x}_b-\mathbf{x_0})^T\mathbf{B}^{-1}(\mathbf{x}_b-\mathbf{x}_0)+\sum_i(\mathbf{y}_i-\mathbf{h}_i(\mathbf{x}_i)))^T\mathbf{R}_i^{-1}(\mathbf{y}_i-\mathbf{h}_i(\mathbf{x}_i))}{2} + \sum_{i=0}^N \left(\mathbf{x}_{i+1}-\mathbf{x}_i-\mathbf{m}(\mathbf{x}_i, t_i)\right)^T\mathbf{Q}_i^{-1}\left(\mathbf{x}_{i+1}-\mathbf{x}_i-\mathbf{m}(\mathbf{x}_i, t_i)\right)$$

Extended Kalman Filter

  • We can also extend the Kalman Filter to the nonlinear case by linearizing the model appropriately:
  • Assume covariances update via linearized model, $M$ state by nonlinear model $m$.

You'll look at both of these in the exercises.

The Ensemble Kalman Filter( & friends)

  • Rather than calculating $B$ with statistics, can calculate for multiple states & use sample mean, covariance, etc. $$\overline{\mathbf{x}}_b^{k+1} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_a^(i)\approx \mathbf{\mu},$$ $$\mathcal{P}^{k+1}:= \frac{1}{N-1} \sum_{i=1}^N (\mathbf{x}_b^{(i)}-\overline{\mathbf{x}}_b)(\mathbf{x}^{(i)}-\overline{\mathbf{x}}_b)^T\approx \mathbf{P}.$$
  • Use OI approach (with $\mathcal{P}$) to update data in each member only.
  • Cheap (but sometimes inaccurate) with small sample set.
  • No need to modify model/ write down adjoint.
In [34]:
# trivial Ensemble Kalman filter for Lorenz attractor


# start by building a model
from scipy.integrate import odeint

rho = 28
sigma = 10
beta = 8/3

def f(X, t):
    """ Put the Forward Euler right hand side for Lorenz attractor"""
    
    x = X.reshape(3, len(X)//3)
    
    y = np.empty(x.shape)
    
    y[0, :] = sigma*(x[1, :]-x[0, :])
    y[1, :] = -x[0, :]*x[2, :] + rho*x[0, :]-x[1, :]
    y[2, :] = x[0, :]*x[1, :] - beta*x[2, :]
    
    return y.ravel()

x0 = np.ones(3)
t = np.linspace(0, 20, 1001)

x_t = odeint(f, x0, t).reshape(1001, 3)
In [35]:
fig = plt.figure(figsize=(15,15))
ax = fig.add_subplot(111, projection='3d')
ax.plot(*x_t[:,0:3].T)

y = x_t[100::100, 0] + np.random.normal(0, 0.1, 10)

H =np.zeros((1, 3))
H[0, 0] = 1

plt.plot(y, *x_t[100::100, 1:].T , 'ro');
In [36]:
N = 10

x_b0 = np.ones(3) + np.random.normal(0, 0.05, 3)
X_b0 = np.tile(x_b0, (N, 1)).T + np.random.normal(0, 0.05, (3, N))

X_b = np.empty([len(t), 3, N])


def C(X):
    """Estimate Covariance from ensemble."""
    Xmean = np.mean(X, axis=1)
    A = X - np.tile(Xmean, (N, 1)).T
    return A.dot(A.T)/(N-1)

X_b[:101, ...] = odeint(f, X_b0.ravel(), t[:101]).reshape(101, 3, N)
X_f = odeint(f, X_b0.ravel(), t).reshape(1001, 3, N)

# Integrate Kalman system, assimilating observations of $x$
# using ensemble Kalman filter.

for i in range(1, 10):
    # perturb observations
    Z = np.tile(y[i-1], (N, 1)).T - H.dot(X_b[i*100, :, :]) + np.random.normal(0, 0.1, (1, N))
    B = C(X_b[i*100, :, :])
    K = B.dot(H.T)*inv((H.dot(B.dot(H.T))+0.1**2))
    # assimilate
    X_a = X_b[i*100, :, :] + K.dot(Z)
    # integrate to next observation
    X_b[i*100:(i+1)*100+1, ...] = odeint(f, X_a.ravel(), t[i*100:(i+1)*100+1]).reshape(101, 3, N)

t_y=0
In [37]:
plt.figure(figsize=(15,5))
plt.plot(t, x_t[:, 0], 'k--')
plt.plot(t[100::100], y,'ro')
plt.plot(t, X_b[:, 0,],)
plt.ylabel('$x$')
plt.xlabel('time');
In [38]:
plt.figure(figsize=(15,5))
plt.plot(t, x_t[:, 0], 'k--')
plt.plot(t[100::100], y,'ro')
plt.plot(t, np.mean(X_b[:, 0, :], axis=1), label='assimilate x[0]')
plt.plot(t, np.mean(X_f[:, 0, :], axis=1), label='no assimilation')
plt.ylabel('$x$')
plt.xlabel('time')
plt.legend();
In [39]:
plt.figure(figsize=(15,5))
plt.plot(t, (abs(np.mean(X_b[:, 0, :], axis=1)-x_t[:, 0])), label='assimilating data')
plt.plot(t, (abs(np.mean(X_f[:, 0, :], axis=1)-x_t[:, 0])), label='no assimilation')
plt.ylabel('error in $x$')
plt.xlabel('time')
plt.legend()
Out[39]:
<matplotlib.legend.Legend at 0x2cc362fc970>

Modern operational assimilation

  • Often combines ensemble methods (to update $\mathbf{B}$) and variational approaches (for low cost).
  • Short term, high resolution deterministic forecasts tend to imply 4D-Var.
  • Long term, low resolution probabilistic forecasts tend to imply ensemble methods.

You should now:

  • Know a bit about probability
  • Be able to recognise
    • Optimal Interpolation
    • 3D-Var
    • standard Kalman Filter
    • strong constraint 4D-Var
  • Have some idea of their properties and usage.

Any questions?

In [ ]: